AITopics | greedy policy

greedy policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint

Nguyen Cuong, Huan Xu

Neural Information Processing Systems, May-1-2026, 06:07:03 GMT

We study the worst-case adaptive optimization problem with budget constraint that is useful for modeling various practical applications in artificial intelligence and machine learning. We investigate the near-optimality of greedy algorithms for this problem with both modular and non-modular cost functions. In both cases, we prove that two simple greedy algorithms are not near-optimal but the best between them is near-optimal if the utility function satisfies pointwise submodularity and pointwise cost-sensitive submodularity respectively. This implies a combined algorithm that is near-optimal with respect to the optimal algorithm that uses half of the budget. We discuss applications of our theoretical results and also report experiments comparing the greedy algorithms on the active learning problem.
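The "best of two greedy rules" idea in this abstract can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the two rules are taken to be the usual ones (largest utility gain, and largest gain per unit cost), the setting is simplified to non-adaptive selection with modular costs, and the coverage utility and every name below (`greedy_select`, `best_of_two_greedy`, `covers`) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_select(
    items: List[str],
    cost: Dict[str, float],
    utility: Callable[[FrozenSet[str]], float],
    budget: float,
    cost_sensitive: bool,
) -> FrozenSet[str]:
    """Add items greedily until no affordable item improves the utility.

    Costs are assumed to be strictly positive.
    """
    chosen: FrozenSet[str] = frozenset()
    spent = 0.0
    while True:
        best_item, best_score = None, 0.0
        for item in items:
            if item in chosen or spent + cost[item] > budget:
                continue  # already taken, or it would exceed the budget
            gain = utility(chosen | {item}) - utility(chosen)
            # Rule 1 scores by raw gain, rule 2 by gain per unit cost.
            score = gain / cost[item] if cost_sensitive else gain
            if score > best_score:
                best_item, best_score = item, score
        if best_item is None:
            return chosen
        chosen = chosen | {best_item}
        spent += cost[best_item]


def best_of_two_greedy(items, cost, utility, budget) -> FrozenSet[str]:
    """Run both greedy rules and keep whichever result has higher utility."""
    plain = greedy_select(items, cost, utility, budget, cost_sensitive=False)
    ratio = greedy_select(items, cost, utility, budget, cost_sensitive=True)
    return max(plain, ratio, key=utility)


# Toy coverage utility (an assumption): each item covers a set of elements.
covers = {"big": {1, 2, 3, 4, 5}, "s1": {1, 2}, "s2": {3, 4}, "s3": {5, 6}, "s4": {7}}
cost = {"big": 4.0, "s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0}


def utility(chosen: FrozenSet[str]) -> float:
    return float(len(set().union(*(covers[i] for i in chosen))))


plain_result = greedy_select(list(covers), cost, utility, 4.0, cost_sensitive=False)
combined_result = best_of_two_greedy(list(covers), cost, utility, budget=4.0)
```

In this toy instance the gain-only rule spends the whole budget on the single expensive item, while the ratio rule picks the four cheap ones and covers more elements, so the combined procedure returns the ratio rule's answer. Swapping the costs can make the opposite rule win, which is the reason for taking the better of the two.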

artificial intelligence, machine learning, submodularity, (12 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Appendix: On the Expressivity of Markov Reward

Neural Information Processing Systems, Apr-25-2026, 14:37:29 GMT

We first address questions that might arise in response to the main text. That is, if Alice chooses a SOAP, PO, or TO for Bob to learn to solve, when can Alice determine Bob has solved the task? A: Bob can be said to be doing better on a given task if his behavior improves, as is typical in evaluating behavior under reward. The difference with SOAPs, POs, and TOs is that we measure improvement relative to the task rather than reward. For instance, given a SOAP, we might say that Bob has solved the task once he has found one of the good policies, and we might measure Bob's progress on a task in terms of the distance of his greedy policy to one of the good policies (as done in our learning experiments). The same reasoning applies to POs and TOs: Bob is doing better on a task insofar as his greedy policy (or trajectories) is (are) higher up the ordering. That is, the studied reward functions must be a function of s, (s, a), or (s, a, s'). A: Indeed, as discussed in our introduction, our goal is to examine the expressivity of Markov rewards in the context of finite MDPs.
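The progress measure mentioned in this excerpt, the distance from Bob's greedy policy to the nearest good policy, can be sketched as follows. The excerpt does not say which distance is used, so the Hamming distance over states is an assumption here, as are the tabular Q-value representation and the names `greedy_policy` and `distance_to_good_policies`.

```python
from typing import List, Sequence


def greedy_policy(q_values: Sequence[Sequence[float]]) -> List[int]:
    """For each state, pick the action with the largest Q-value (first on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


def distance_to_good_policies(
    policy: Sequence[int], good_policies: Sequence[Sequence[int]]
) -> int:
    """Smallest number of states in which `policy` disagrees with a good policy."""
    return min(
        sum(1 for a, b in zip(policy, good) if a != b) for good in good_policies
    )


# Hypothetical 3-state, 2-action example: rows are states, columns are actions.
q_table = [[1.0, 0.5], [0.2, 0.9], [0.3, 0.1]]
good = [[0, 1, 1], [1, 0, 1]]  # the set of acceptable policies defining the task

bob = greedy_policy(q_table)
progress = distance_to_good_policies(bob, good)
```

Under this reading Bob has solved the task when `progress` reaches 0, that is, when his greedy policy coincides with one of the good policies.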

artificial intelligence, machine learning, reward function, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)

114292cf3f930ba157ed33f66997fee2-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 15:48:33 GMT

artificial intelligence, machine learning, policy change, (16 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

8dbd2780192078711c0f31e10a819031-Paper-Conference.pdf

Neural Information Processing Systems, Mar-14-2026, 01:37:09 GMT

algorithm, decay rate, lac condition, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems, Feb-11-2026, 17:45:22 GMT

The results are based on a novel analysis of real-time dynamic programming, then extended to model-based RL. Specifically, we generalize existing algorithms that perform full-planning to act by 1-step planning.

artificial intelligence, machine learning, skt, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.05)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Neural Information Processing Systems, Feb-9-2026, 02:31:19 GMT

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.
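One common way to write the problem described in this abstract is shown below. The notation is an assumption, not taken from the excerpt: an episodic setting with horizon H, reward r_h, utility g_h, threshold b, K episodes, and a lower-bound form of the constraint.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{h=1}^{H} r_h(s_h, a_h) \right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{h=1}^{H} g_h(s_h, a_h) \right] \ge b

% Writing V_r^{\pi} and V_g^{\pi} for the two expectations above, and \pi_k for
% the policy used in episode k, the two quantities the abstract asks to be
% sublinear in K would be:
\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_r^{\pi^{*}} - V_r^{\pi_k} \right),
\qquad
\mathrm{Violation}(K) = \sum_{k=1}^{K} \left( b - V_g^{\pi_k} \right)
```

Here \pi^{*} denotes the best policy among those satisfying the constraint. Sublinear growth of both sums means the per-episode regret and per-episode violation vanish as K grows.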

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

2109737282d2c2de4fc5534be26c9bb6-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 19:34:16 GMT

bandit, dynamical system, satiation dynamic, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems, Dec-25-2025, 03:51:07 GMT

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning on Markov Decision Processes (MDPs) built from the gathered experience. In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies, i.e., acting by 1-step planning, can achieve tight minimax performance in terms of regret, O(\sqrt{HSAT}). Thus, full-planning in model-based RL can be avoided altogether without any performance degradation, and, by doing so, the computational complexity decreases by a factor of S. The results are based on a novel analysis of real-time dynamic programming, then extended to model-based RL. Specifically, we generalize existing algorithms that perform full-planning to variants that act by 1-step planning. For these generalizations, we prove regret bounds with the same rate as their full-planning counterparts.
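To make "acting by 1-step planning" concrete, here is a minimal sketch in the spirit of real-time dynamic programming. It assumes a known transition model, rewards in [0, 1], and optimistic initial values, whereas the model-based RL algorithms in the abstract work with estimated models; the function name `one_step_greedy_episode` and the toy MDP are hypothetical.

```python
import random
from typing import List


def one_step_greedy_episode(
    P: List[List[List[float]]],  # P[s][a][s2]: probability of moving s -> s2
    R: List[List[float]],        # R[s][a]: reward in [0, 1]
    V: List[List[float]],        # V[h][s]: value estimates, with V[H][s] == 0
    start: int,
    rng: random.Random,
) -> float:
    """Run one episode, planning only one step ahead at each visited state."""
    horizon = len(V) - 1
    s, total = start, 0.0
    for h in range(horizon):
        # One-step lookahead: immediate reward plus next-step value estimate.
        q = [
            R[s][a] + sum(p * V[h + 1][s2] for s2, p in enumerate(P[s][a]))
            for a in range(len(R[s]))
        ]
        a = max(range(len(q)), key=q.__getitem__)  # greedy action
        V[h][s] = q[a]  # update only the state actually visited
        total += R[s][a]
        s = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    return total


# Toy deterministic MDP: state 1 pays reward 1 for staying, state 0 pays nothing.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 0.0]]
H = 3
V = [[float(H - h)] * 2 for h in range(H + 1)]  # optimistic initialization

rng = random.Random(0)
returns = [one_step_greedy_episode(P, R, V, start=0, rng=rng) for _ in range(3)]
```

Each step touches a single state and costs on the order of S times A operations, whereas a full backward-induction sweep over all states at every step would cost S times more, which is consistent with the factor-of-S saving the abstract mentions. In this toy run the first episode wastes its steps while the optimistic values are being corrected, and later episodes follow the optimal route through state 1.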

model-based reinforcement learning, name change, tight regret bound, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW

Abdo Abouelrous, Laurens Bliek, Yaoxin Wu, Yingqian Zhang

arXiv.org Artificial Intelligence, Dec-2-2025

In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although different objectives could be incorporated. Learning-based methods offer substantial computational advantages as they can repeatedly solve problems with limited intervention from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. In turn, we propose a model that simultaneously addresses stochasticity and multi-objectivity and provide a refined training mechanism for this model through scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto Front of good quality within acceptable run times compared to three baselines.
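The Pareto front referred to in this abstract is the set of solutions that no other solution beats on both objectives at once. A minimal sketch of extracting it from a pool of candidate routes is given below; the objective values are made up for illustration, both objectives are assumed to be minimized, and the names `dominates` and `pareto_front` are hypothetical rather than part of the authors' model.

```python
from typing import List, Tuple

# (total travel time, route makespan); both are assumed to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the distinct non-dominated points, sorted by the first objective."""
    return sorted({p for p in points if not any(dominates(q, p) for q in points)})


# Made-up objective values for six candidate solutions (one is a duplicate).
candidates = [(10, 5), (8, 7), (9, 6), (12, 4), (9, 8), (10, 5)]
front = pareto_front(candidates)
```

The pairwise check is quadratic in the number of candidates, which is usually acceptable for the small solution pools a multi-trajectory model produces per instance.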

ea-cluster, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.01518

Genre: Research Report > New Finding (0.54)

Industry: Transportation (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
(2 more...)
